mlr3 Tutorial

Bernd Bischl & Marvin N. Wright

Introduction and Overview

mlr3 book

Our book is the central entry point to the mlr3 ecosystem

This tutorial follows the book closely

Extension Packages

mlr3verse

The mlr3verse package contains all important packages of the mlr3 ecosystem

library(mlr3verse)

mlr3 Philosophy

  • Object-oriented programming

  • Tabular data

  • Unified tabular input and output data formats

  • Defensive programming and type safety

  • Light on dependencies

  • Separation of computation and presentation

R6 Classes

One of R’s more recent paradigms for object-oriented programming

Instances of an R6 class are created by using $new()

foo = Foo$new(bar = 1)

In practice, $new() is often replaced by sugar functions, e.g.

as_task_regr()

R6 - Fields

R6 objects may have mutable state that is encapsulated in their fields

Can be accessed and modified through the dollar $ operator

foo$bar = 2

Fields can also be ‘active bindings’, which perform additional computations when referenced or modified.
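These field mechanics can be sketched with a minimal R6 class; Foo, bar, and the active binding bar_squared are hypothetical illustrations, not mlr3 classes:

```r
library(R6)

# Hypothetical class to illustrate fields and active bindings
Foo = R6Class("Foo",
  public = list(
    bar = NULL,
    initialize = function(bar) {
      self$bar = bar
    }
  ),
  active = list(
    # active binding: recomputed each time it is referenced
    bar_squared = function() self$bar^2
  )
)

foo = Foo$new(bar = 1)
foo$bar = 2          # modify the field via $
foo$bar_squared      # computed on access
```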

R6 - Methods

R6 objects have methods: functions that are associated with the object

Methods can change the internal state of the object

learner$train()

Or retrieve information about the object

R6 - Reference Semantics

R6 objects are environments

# does not create a copy of foo
foo2 = foo

# also changes the value of bar in foo2
foo$bar = 3

# creates a copy of foo
foo3 = foo$clone(deep = TRUE)

Sugar Functions

Most mlr3 objects are created with sugar functions

Reduces the amount of code a user has to write

For example, lrn() creates a learner object without having to use $new()

lrn("regr.rpart")

Dictionaries

R6 classes are stored in dictionaries

Associates keys with objects

mlr_tasks
<DictionaryTask> with 22 stored values
Keys: ames_housing, bike_sharing, boston_housing, breast_cancer,
  california_housing, german_credit, ilpd, iris, kc_housing, moneyball,
  mtcars, optdigits, penguins, penguins_simple, pima, ruspini, sonar,
  spam, titanic, usarrests, wine, zoo

Use sugar functions to retrieve objects from dictionaries

tsk("pima")
<TaskClassif:pima> (768 x 9): Pima Indian Diabetes
* Target: diabetes
* Properties: twoclass
* Features (8):
  - dbl (8): age, glucose, insulin, mass, pedigree, pregnant, pressure,
    triceps

Data and Basic Modeling

Machine Learning Process

Predefined Tasks

mlr3 includes a few predefined tasks

Stored in the mlr_tasks dictionary

mlr_tasks
<DictionaryTask> with 22 stored values
Keys: ames_housing, bike_sharing, boston_housing, breast_cancer,
  california_housing, german_credit, ilpd, iris, kc_housing, moneyball,
  mtcars, optdigits, penguins, penguins_simple, pima, ruspini, sonar,
  spam, titanic, usarrests, wine, zoo

Load Predefined Task

To get a task from the dictionary, use the tsk() function

tsk_mtcars = tsk("mtcars")
tsk_mtcars
<TaskRegr:mtcars> (32 x 11): Motor Trends
* Target: mpg
* Properties: -
* Features (10):
  - dbl (10): am, carb, cyl, disp, drat, gear, hp, qsec, vs, wt

Constructing Tasks

Construct regression task with the as_task_regr() function

data("mtcars", package = "datasets")
tsk_mtcars = as_task_regr(mtcars, target = "mpg", id = "cars")
tsk_mtcars
<TaskRegr:cars> (32 x 11)
* Target: mpg
* Properties: -
* Features (10):
  - dbl (10): am, carb, cyl, disp, drat, gear, hp, qsec, vs, wt

Task Mutators

Keep only one feature

tsk_mtcars_small = tsk("mtcars") # initialize with the full task
tsk_mtcars_small$select("cyl")

Keep only these rows

tsk_mtcars_small$filter(2:3)
tsk_mtcars_small$data()
     mpg   cyl
   <num> <num>
1:  21.0     6
2:  22.8     4

Help Pages

Help pages of functions can be queried with ?

To open the help page of the mtcars task, use ?mlr_tasks_mtcars

$help() allows you to access the help page from any instance of that class

tsk("mtcars")$help()

Learners

Unified interface to many popular machine learning algorithms

Access learners from the dictionary with lrn()

lrn("regr.rpart")
<LearnerRegrRpart:regr.rpart>: Regression Tree
* Model: -
* Parameters: xval=0
* Packages: mlr3, rpart
* Predict Types:  [response]
* Feature Types: logical, integer, numeric, factor, ordered
* Properties: importance, missings, selected_features, weights

Learner Metadata

  • $feature_types: type of features the learner can handle
  • $packages: packages required to be installed to use the learner
  • $properties: properties of the learner e.g. the “missings” property
  • $predict_types: types of prediction that the model can make
  • $param_set: set of available hyperparameters
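A quick sketch of inspecting this metadata on the regression tree learner:

```r
library(mlr3)

lrn_rpart = lrn("regr.rpart")
lrn_rpart$feature_types   # feature types the learner can handle
lrn_rpart$packages        # packages that must be installed
lrn_rpart$properties      # e.g. "missings" if NAs are handled natively
lrn_rpart$predict_types   # e.g. "response"
lrn_rpart$param_set       # available hyperparameters
```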

Learner Stages

Training

# load mtcars task
tsk_mtcars = tsk("mtcars")
# load a regression tree
lrn_rpart = lrn("regr.rpart")
# pass the task to the learner via $train()
lrn_rpart$train(tsk_mtcars)
# inspect the trained model
lrn_rpart$model
n= 32 

node), split, n, deviance, yval
      * denotes terminal node

1) root 32 1126.04700 20.09062  
  2) cyl>=5 21  198.47240 16.64762  
    4) hp>=192.5 7   28.82857 13.41429 *
    5) hp< 192.5 14   59.87214 18.26429 *
  3) cyl< 5 11  203.38550 26.66364 *

Partitioning Data

Randomly split the given task into disjoint training and test sets

splits = partition(tsk_mtcars)
splits
$train
 [1]  3  5  7  8  9 10 11 14 15 16 17 22 23 24 25 26 27 28 29 31 32

$test
 [1]  1  2  4  6 12 13 18 19 20 21 30

$validation
integer(0)

Train the learner on the training set

lrn_rpart$train(tsk_mtcars, row_ids = splits$train)

Predicting

Predict from trained model

prediction = lrn_rpart$predict(tsk_mtcars, row_ids = splits$test)

Returns a Prediction object

prediction
<PredictionRegr> for 11 observations:
 row_ids truth response
       1  21.0 15.33571
       2  21.0 15.33571
       4  21.4 15.33571
     ---   ---      ---
      20  33.9 25.01429
      21  21.5 25.01429
      30  19.7 25.01429

Get tabular form:

as.data.table(prediction)

Hyperparameters

Affect how the learner is run. Represented as ParamSet object:

lrn_rpart$param_set
<ParamSet(10)>
                id    class lower upper nlevels        default  value
            <char>   <char> <num> <num>   <num>         <list> <list>
 1:             cp ParamDbl     0     1     Inf           0.01 [NULL]
 2:     keep_model ParamLgl    NA    NA       2          FALSE [NULL]
 3:     maxcompete ParamInt     0   Inf     Inf              4 [NULL]
 4:       maxdepth ParamInt     1    30      30             30 [NULL]
 5:   maxsurrogate ParamInt     0   Inf     Inf              5 [NULL]
 6:      minbucket ParamInt     1   Inf     Inf <NoDefault[0]> [NULL]
 7:       minsplit ParamInt     1   Inf     Inf             20 [NULL]
 8: surrogatestyle ParamInt     0     1       2              0 [NULL]
 9:   usesurrogate ParamInt     0     2       3              2 [NULL]
10:           xval ParamInt     0   Inf     Inf             10      0

This defines the configuration space and contains the actual hyperparameter values.

Parameter Classes

The class of a parameter determines its possible values

Hyperparameter Class   Hyperparameter Type
ParamDbl               Real-valued (numeric)
ParamInt               Integer
ParamFct               Categorical (factor)
ParamLgl               Logical / Boolean
ParamUty               Untyped
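As a sketch, paradox's p_*() helpers construct one parameter of each class (the parameter names here are made up for illustration):

```r
library(paradox)

# One hypothetical parameter per class
param_set = ps(
  cp       = p_dbl(lower = 0, upper = 1),      # ParamDbl: real-valued
  maxdepth = p_int(lower = 1, upper = 30),     # ParamInt: integer
  method   = p_fct(c("anova", "class")),       # ParamFct: categorical
  verbose  = p_lgl(),                          # ParamLgl: logical
  weights  = p_uty()                           # ParamUty: untyped
)
param_set
```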

Getting and Setting Hyperparameters

During construction

lrn_rpart = lrn("regr.rpart", maxdepth = 1)
lrn_rpart$param_set$values
$maxdepth
[1] 1

$xval
[1] 0

Updating after construction

lrn_rpart$param_set$set_values(xval = 2, cp = 0.5)

Evaluation

Evaluating the model’s performance

Perhaps the most important step of applied machine learning

Quantify the accuracy of the model’s predictions

We continue with the code from the previous slides

lrn_rpart = lrn("regr.rpart")
tsk_mtcars = tsk("mtcars")
splits = partition(tsk_mtcars)
lrn_rpart$train(tsk_mtcars, splits$train)
prediction = lrn_rpart$predict(tsk_mtcars, splits$test)

Measure

Quality of predictions is evaluated using measures

Access measures from the dictionary with msr()

as.data.table(msr())[c(3, 4, 6, 7, 17, 45, 55),
  .(key, label, task_type, predict_type)]
Key: <key>
           key                   label task_type predict_type
        <char>                  <char>    <char>       <char>
1:          ci              Default CI      <NA>     response
2:    ci.con_z Conservative-Z Interval      <NA>     response
3:  ci.holdout        Holdout Interval      <NA>     response
4:      ci.ncv      Nested CV Interval      <NA>     response
5: classif.fdr    False Discovery Rate   classif     response
6:   clust.wss   Within Sum of Squares     clust    partition
7:  regr.medse    Median Squared Error      regr     response

Measure

Mean absolute error

\(f(y, \hat{y}) = \frac{1}{n} \sum_{i=1}^{n} | y_i - \hat{y}_i |\)
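A tiny worked example of this formula in base R, on made-up values:

```r
# Hand-computed mean absolute error on made-up values
y    = c(3.0, 2.5, 4.0)   # truth
yhat = c(2.5, 3.0, 5.0)   # predictions
mean(abs(y - yhat))       # (0.5 + 0.5 + 1.0) / 3
```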

measure = msr("regr.mae")
measure
<MeasureRegrSimple:regr.mae>: Mean Absolute Error
* Packages: mlr3, mlr3measures
* Range: [0, Inf]
* Minimize: TRUE
* Average: macro
* Parameters: list()
* Properties: -
* Predict type: response

Scoring Predictions

Score the predictions with the mean absolute error

prediction$score(measure)
regr.mae 
3.234266 

Classification

Classification problems are ones in which a model predicts a discrete, categorical target

The interface is kept as similar as possible to regression

set.seed(349)
# load and partition our task
tsk_penguins = tsk("penguins")
splits = partition(tsk_penguins)
# load decision tree and set hyperparameters
lrn_rpart = lrn("classif.rpart", cp = 0.2, maxdepth = 5)
# load accuracy measure
measure = msr("classif.acc")
# train learner
lrn_rpart$train(tsk_penguins, splits$train)
# make and score predictions
lrn_rpart$predict(tsk_penguins, splits$test)$score(measure)
classif.acc 
  0.9473684 

Binary Classification Tasks

The sonar task is an example of a binary classification problem

In mlr3 terminology it has the “twoclass” property

tsk_sonar = tsk("sonar")
tsk_sonar
tsk_sonar$class_names

Multiclass Classification Tasks

tsk("penguins") is a multiclass problem as there are more than two species of penguins; it has the “multiclass” property

tsk_penguins = tsk("penguins")
tsk_penguins$properties
tsk_penguins$class_names

Classification Predictions

Predictions in classification are either "response" – the predicted class of an observation – or "prob" – a vector of probabilities that an observation belongs to each class

lrn_rpart = lrn("classif.rpart", predict_type = "prob")
lrn_rpart$train(tsk_penguins, splits$train)
prediction = lrn_rpart$predict(tsk_penguins, splits$test)
prediction
<PredictionClassif> for 114 observations:
 row_ids     truth  response prob.Adelie prob.Chinstrap prob.Gentoo
       1    Adelie    Adelie  0.97029703     0.02970297   0.0000000
       2    Adelie    Adelie  0.97029703     0.02970297   0.0000000
       3    Adelie    Adelie  0.97029703     0.02970297   0.0000000
     ---       ---       ---         ---            ---         ---
     338 Chinstrap Chinstrap  0.04255319     0.93617021   0.0212766
     339 Chinstrap Chinstrap  0.04255319     0.93617021   0.0212766
     342 Chinstrap Chinstrap  0.04255319     0.93617021   0.0212766

Classification Measures

To evaluate "response" predictions, you will need measures with predict_type = "response"

To evaluate probability predictions you will need predict_type = "prob"

measures = msrs(c("classif.mbrier", "classif.logloss", "classif.acc"))
prediction$score(measures)
 classif.mbrier classif.logloss     classif.acc 
      0.1028996       0.7548078       0.9385965 

Confusion Matrix

The rows in a confusion matrix are the predicted class and the columns are the true class

All off-diagonal entries are incorrectly classified observations, and all diagonal entries are correctly classified

prediction$confusion
           truth
response    Adelie Chinstrap Gentoo
  Adelie        48         2      0
  Chinstrap      4        14      1
  Gentoo         0         0     45

Confusion Matrix

You can visualize the predicted class labels with autoplot.PredictionClassif()

autoplot(prediction)

Thresholding

The default response prediction type is the class with the highest predicted probability

task_credit = tsk("german_credit")
split = partition(task_credit)

lrn_rpart = lrn("classif.rpart", predict_type = "prob")
lrn_rpart$train(task_credit, split$train)
prediction = lrn_rpart$predict(task_credit, split$test)
prediction$score(msr("classif.acc"))
classif.acc 
  0.7030303 
prediction$confusion
        truth
response good bad
    good  187  71
    bad    27  45

Thresholding

In binary classification, the positive class is predicted if its probability exceeds 50%, and the negative class otherwise

Changing the threshold is useful when classes are imbalanced, when misclassification costs differ between classes, or when one class should deliberately be ‘over’-predicted

prediction$set_threshold(0.7)
prediction$score(msr("classif.acc"))
classif.acc 
  0.7060606 
prediction$confusion
        truth
response good bad
    good  183  66
    bad    31  50

Imbalance Metrics

prediction$set_threshold(0.2)
prediction$score(msrs(c("classif.tpr", "classif.ppv", "classif.fbeta")))
  classif.tpr   classif.ppv classif.fbeta 
    0.9579439     0.6788079     0.7945736 
prediction$set_threshold(0.5)
prediction$score(msrs(c("classif.tpr", "classif.ppv", "classif.fbeta")))
  classif.tpr   classif.ppv classif.fbeta 
    0.8738318     0.7248062     0.7923729 
prediction$set_threshold(0.8)
prediction$score(msrs(c("classif.tpr", "classif.ppv", "classif.fbeta")))
  classif.tpr   classif.ppv classif.fbeta 
    0.7476636     0.7339450     0.7407407 

Threshold vs. Performance

autoplot(prediction, measure = msr("classif.tpr"), type = "threshold")

ROC Curve

autoplot(prediction, type = "roc")

Summary

task = tsk("spam")
learner = lrn("classif.rpart")

splits = partition(task)

learner$train(task, row_ids = splits$train)

prediction = learner$predict(task, row_ids = splits$test)
prediction

prediction$score(msr("classif.acc"))

prediction$confusion

Evaluation and Benchmarking

Resampling

Resampling Strategy

Resampling strategies repeatedly split all available data into multiple training and test sets

Access resampling strategy from the dictionary with rsmp()

as.data.table(rsmp())[, .(key, label)]
Key: <key>
                   key                         label
                <char>                        <char>
 1:          bootstrap                     Bootstrap
 2:             custom                 Custom Splits
 3:          custom_cv Custom Split Cross-Validation
 4:                 cv              Cross-Validation
 5:            holdout                       Holdout
 6:           insample           Insample Resampling
 7:                loo                 Leave-One-Out
 8:          nested_cv                     Nested CV
 9: paired_subsampling            Paired Subsampling
10:        repeated_cv     Repeated Cross-Validation
11:        subsampling                   Subsampling

Cross Validation

Resampling Strategy

Holdout method

holdout = rsmp("holdout", ratio = 0.8)

3-fold CV

cv3 = rsmp("cv", folds = 3)

Subsampling with 3 repeats and 9/10 ratio

ss390 = rsmp("subsampling", repeats = 3, ratio = 0.9)

2-repeats 5-fold CV

rcv25 = rsmp("repeated_cv", repeats = 2, folds = 5)

Resampling Experiments

resample() repeatedly fits a model on training sets and makes predictions on the corresponding test sets

Stores them in a ResampleResult object

tsk_penguins = tsk("penguins")
lrn_rpart = lrn("classif.rpart")

rr = resample(tsk_penguins, lrn_rpart, cv3, store_models = TRUE)
rr
<ResampleResult> with 3 resampling iterations
  task_id    learner_id resampling_id iteration     prediction_test warnings
 penguins classif.rpart            cv         1 <PredictionClassif>        0
 penguins classif.rpart            cv         2 <PredictionClassif>        0
 penguins classif.rpart            cv         3 <PredictionClassif>        0
 errors
      0
      0
      0

Score and Aggregate

Score Resample Result

We can calculate the score for each iteration with $score()

acc = rr$score(msr("classif.ce"))
acc[, .(iteration, classif.ce)]
   iteration classif.ce
       <int>      <num>
1:         1 0.02608696
2:         2 0.08695652
3:         3 0.07017544

Aggregate Resample Result

$aggregate() returns the aggregated score across all resampling iterations

rr$aggregate(msr("classif.ce"))
classif.ce 
0.06107297 

ResampleResult Objects

Prediction object for each resampling iteration

rr$predictions()[[1]]
<PredictionClassif> for 115 observations:
 row_ids     truth  response
       4    Adelie    Adelie
       5    Adelie    Adelie
       6    Adelie    Adelie
     ---       ---       ---
     330 Chinstrap Chinstrap
     335 Chinstrap Chinstrap
     344 Chinstrap Chinstrap

ResampleResult Objects

Can also be used for model inspection

rr$learners[[1]]$model
n= 229 

node), split, n, loss, yval, (yprob)
      * denotes terminal node

1) root 229 132 Adelie (0.423580786 0.240174672 0.336244541)  
  2) flipper_length< 206.5 146  51 Adelie (0.650684932 0.342465753 0.006849315)  
    4) bill_length< 43.05 95   4 Adelie (0.957894737 0.042105263 0.000000000) *
    5) bill_length>=43.05 51   5 Chinstrap (0.078431373 0.901960784 0.019607843) *
  3) flipper_length>=206.5 83   7 Gentoo (0.024096386 0.060240964 0.915662651)  
    6) island=Dream,Torgersen 7   2 Chinstrap (0.285714286 0.714285714 0.000000000) *
    7) island=Biscoe 76   0 Gentoo (0.000000000 0.000000000 1.000000000) *

Summary

task = tsk("mtcars")
learner = lrn("regr.rpart")
resampling = rsmp("repeated_cv", repeats = 5, folds = 3)

rr = resample(task, learner, resampling, store_models = TRUE)
rr

Benchmarking

Compare multiple learners on a single task

Or multiple learners on multiple tasks

tasks = tsks(c("german_credit", "sonar"))
learners = lrns(c("classif.rpart", "classif.ranger",
  "classif.featureless"), predict_type = "prob")
rsmp_cv5 = rsmp("cv", folds = 5)

design = benchmark_grid(tasks, learners, rsmp_cv5)
design
            task             learner resampling
          <char>              <char>     <char>
1: german_credit       classif.rpart         cv
2: german_credit      classif.ranger         cv
3: german_credit classif.featureless         cv
4:         sonar       classif.rpart         cv
5:         sonar      classif.ranger         cv
6:         sonar classif.featureless         cv

Benchmarking

Benchmark experiments are conducted with benchmark()

Runs resample() on each task and learner separately

Collects the results in a BenchmarkResult object

bmr = benchmark(design)
bmr
<BenchmarkResult> of 30 rows with 6 resampling runs
 nr       task_id          learner_id resampling_id iters warnings errors
  1 german_credit       classif.rpart            cv     5        0      0
  2 german_credit      classif.ranger            cv     5        0      0
  3 german_credit classif.featureless            cv     5        0      0
  4         sonar       classif.rpart            cv     5        0      0
  5         sonar      classif.ranger            cv     5        0      0
  6         sonar classif.featureless            cv     5        0      0

Score Benchmark Result

$score() will return results over each fold of each learner/task/resampling combination

bmr$score()[c(1, 7, 13), .(iteration, task_id, learner_id, classif.ce)]
   iteration       task_id          learner_id classif.ce
       <int>        <char>              <char>      <num>
1:         1 german_credit       classif.rpart      0.270
2:         2 german_credit      classif.ranger      0.235
3:         3 german_credit classif.featureless      0.290

Aggregate Benchmark Result

$aggregate() returns the aggregated score across all resampling iterations

bmr$aggregate()[, .(task_id, learner_id, classif.ce)]
         task_id          learner_id classif.ce
          <char>              <char>      <num>
1: german_credit       classif.rpart  0.2660000
2: german_credit      classif.ranger  0.2400000
3: german_credit classif.featureless  0.3000000
4:         sonar       classif.rpart  0.2732869
5:         sonar      classif.ranger  0.1775842
6:         sonar classif.featureless  0.5384437

BenchmarkResult Objects

Collection of multiple ResampleResult objects

bmr$resample_result(1)
<ResampleResult> with 5 resampling iterations
       task_id    learner_id resampling_id iteration     prediction_test
 german_credit classif.rpart            cv         1 <PredictionClassif>
 german_credit classif.rpart            cv         2 <PredictionClassif>
 german_credit classif.rpart            cv         3 <PredictionClassif>
 german_credit classif.rpart            cv         4 <PredictionClassif>
 german_credit classif.rpart            cv         5 <PredictionClassif>
 warnings errors
        0      0
        0      0
        0      0
        0      0
        0      0

BenchmarkResult as Table

Convert to a data.table

as.data.table(bmr)

Visualize Benchmark Results

autoplot(bmr, measure = msr("classif.acc"))

Hyperparameter Optimization

Hyperparameter Optimization Loop

Learner and Search Space

Decide which hyperparameters to tune and over what range to tune them

as.data.table(lrn("classif.svm")$param_set)[1:12,
  .(id, class, lower, upper, nlevels)]
                 id    class lower upper nlevels
             <char>   <char> <num> <num>   <num>
 1:       cachesize ParamDbl  -Inf   Inf     Inf
 2:   class.weights ParamUty    NA    NA     Inf
 3:           coef0 ParamDbl  -Inf   Inf     Inf
 4:            cost ParamDbl     0   Inf     Inf
 5:           cross ParamInt     0   Inf     Inf
 6: decision.values ParamLgl    NA    NA       2
 7:          degree ParamInt     1   Inf     Inf
 8:         epsilon ParamDbl     0   Inf     Inf
 9:          fitted ParamLgl    NA    NA       2
10:           gamma ParamDbl     0   Inf     Inf
11:          kernel ParamFct    NA    NA       4
12:              nu ParamDbl  -Inf   Inf     Inf

TuneToken

to_tune() specifies the hyperparameter to tune and the range to tune over

learner = lrn("classif.svm",
  type  = "C-classification",
  kernel = "radial",
  cost  = to_tune(1e-1, 1e5),
  gamma = to_tune(1e-1, 1)
)
learner
<LearnerClassifSVM:classif.svm>: Support Vector Machine
* Model: -
* Parameters: cost=<RangeTuneToken>, gamma=<RangeTuneToken>,
  kernel=radial, type=C-classification
* Packages: mlr3, mlr3learners, e1071
* Predict Types:  [response], prob
* Feature Types: logical, integer, numeric
* Properties: multiclass, twoclass

Terminator

Terminator              Function call and default parameters
Clock Time              trm("clock_time")
Number of Evaluations   trm("evals", n_evals = 100, k = 0)
Performance Level       trm("perf_reached", level = 0.1)
Run Time                trm("run_time", secs = 30)
Stagnation              trm("stagnation", iters = 10, threshold = 0)

Terminator

trm("combo") allows combining multiple terminators

trm("none") is used by tuners that terminate on their own

Terminator   Function call and default parameters
Combo        trm("combo", any = TRUE)
None         trm("none")
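As a sketch of combining terminators: stop as soon as either 100 evaluations are reached or 60 seconds have elapsed (the concrete numbers are illustrative):

```r
library(mlr3tuning)

# Stop when EITHER condition is met (any = TRUE)
trm_combo = trm("combo",
  list(
    trm("evals", n_evals = 100),
    trm("run_time", secs = 60)
  ),
  any = TRUE
)
trm_combo
```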

Tuning Instance

Collects the information required to optimize a model

tsk_sonar = tsk("sonar")
instance = ti(
  task = tsk_sonar,
  learner = learner,
  resampling = rsmp("cv", folds = 3),
  measures = msr("classif.ce"),
  terminator = trm("none")
)
instance
<TuningInstanceBatchSingleCrit>
* State:  Not optimized
* Objective: <ObjectiveTuningBatch:classif.svm_on_sonar>
* Search Space:
       id    class lower upper nlevels
   <char>   <char> <num> <num>   <num>
1:   cost ParamDbl   0.1 1e+05     Inf
2:  gamma ParamDbl   0.1 1e+00     Inf
* Terminator: <TerminatorNone>

Tuner

There are multiple Tuner classes in mlr3tuning, which implement different HPO (or, more generally, black box optimization) algorithms

The tnr() function is used to create a tuner

Tuner

Basic algorithms

Tuner           Function call          Package
Random Search   tnr("random_search")   mlr3tuning
Grid Search     tnr("grid_search")     mlr3tuning

Tuner

Adaptive algorithms learn from previously evaluated configurations

Tuner                             Function call    Package
CMA-ES                            tnr("cmaes")     adagio
Generalized Simulated Annealing   tnr("gensa")     GenSA
Nonlinear Optimization            tnr("nloptr")    nloptr
Iterated Racing                   tnr("irace")     irace

Tuner

More adaptive algorithms implemented in extension packages

Tuner                      Function call      Package
Hyperband                  tnr("hyperband")   mlr3hyperband
Model-based Optimization   tnr("mbo")         mlr3mbo

Control Parameters

Control parameters can be set; as with learners, they are accessible via $param_set

tuner = tnr("grid_search", resolution = 5, batch_size = 10)
tuner$param_set
<ParamSet(3)>
                  id    class lower upper nlevels        default  value
              <char>   <char> <num> <num>   <num>         <list> <list>
1:        batch_size ParamInt     1   Inf     Inf <NoDefault[0]>     10
2:        resolution ParamInt     1   Inf     Inf <NoDefault[0]>      5
3: param_resolutions ParamUty    NA    NA     Inf <NoDefault[0]> [NULL]

Triggering the Tuning Process

We can start the tuning process

Pass the constructed TuningInstanceBatchSingleCrit to the $optimize() method of the initialized Tuner

tuner$optimize(instance)
       cost gamma learner_param_vals  x_domain classif.ce
      <num> <num>             <list>    <list>      <num>
1: 25000.08   0.1          <list[4]> <list[2]>  0.2837129

The optimizer returns the best hyperparameter configuration and the corresponding performance

Result

The result is also stored in instance$result

$learner_param_vals lists the optimal hyperparameters from tuning, as well as the values of any other hyperparameters that were set

The $x_domain field is most useful in the context of hyperparameter transformations

instance$result
       cost gamma learner_param_vals  x_domain classif.ce
      <num> <num>             <list>    <list>      <num>
1: 25000.08   0.1          <list[4]> <list[2]>  0.2837129

Logarithmic Transformations

Sample uniformly on the log scale

cost = runif(1000, log(1e-5), log(1e5))

Logarithmic Transformations

Exponentiate to transform the values back to the original scale

exp_cost = exp(cost)

Apply Logarithmic Transformations

To add this transformation to a hyperparameter we simply pass logscale = TRUE

learner = lrn("classif.svm",
  cost  = to_tune(1e-5, 1e5, logscale = TRUE),
  gamma = to_tune(1e-5, 1e5, logscale = TRUE),
  kernel = "radial",
  type = "C-classification"
)

Analyzing the Results

The instance’s archive lists all evaluated hyperparameter configurations

as.data.table(instance$archive)[1:3, .(cost, gamma, classif.ce)]
       cost gamma classif.ce
      <num> <num>      <num>
1:     0.10 0.550  0.5674948
2:     0.10 0.775  0.5674948
3: 25000.08 0.100  0.2837129

Visualizing the Results

Visualize the results as a surface plot with mlr3viz

autoplot(instance, type = "surface")

Training an Optimized Model

We can use the best hyperparameter configuration to train a final model on the whole data

lrn_svm_tuned = lrn("classif.svm")
lrn_svm_tuned$param_set$values = instance$result_learner_param_vals
lrn_svm_tuned$train(tsk_sonar)$model

Call:
svm.default(x = data, y = task$truth(), type = "C-classification", 
    kernel = "radial", gamma = 0.1, cost = 25000.075, probability = (self$predict_type == 
        "prob"))


Parameters:
   SVM-Type:  C-classification 
 SVM-Kernel:  radial 
       cost:  25000.08 

Number of Support Vectors:  205

Convenient Tuning

mlr3tuning includes two helper functions, tune() and auto_tuner(), that simplify tuning

We use the same components as before

tnr_grid_search = tnr("grid_search", resolution = 5, batch_size = 5)
lrn_svm = lrn("classif.svm",
  cost  = to_tune(1e-5, 1e5, logscale = TRUE),
  gamma = to_tune(1e-5, 1e5, logscale = TRUE),
  kernel = "radial",
  type = "C-classification"
)
rsmp_cv3 = rsmp("cv", folds = 3)
msr_ce = msr("classif.ce")

Tuning with tune

Creates a tuning instance and calls $optimize()

instance = tune(
  tuner = tnr_grid_search,
  task = tsk_sonar,
  learner = lrn_svm,
  resampling = rsmp_cv3,
  measures = msr_ce
)
instance$result
       cost     gamma learner_param_vals  x_domain classif.ce
      <num>     <num>             <list>    <list>      <num>
1: 11.51293 -5.756463          <list[4]> <list[2]>  0.1830918

Summary

lrn_rpart = lrn("classif.rpart",
  minsplit  = to_tune(2, 128, logscale = TRUE),
  minbucket = to_tune(1, 64, logscale = TRUE),
  cp        = to_tune(1e-04, 1e-1, logscale = TRUE)
)

instance = ti(
  task = tsk("pima"),
  learner = lrn_rpart,
  resampling = rsmp("cv", folds = 3),
  measures = msr("classif.ce"),
  terminator = trm("evals", n_evals = 100)
)
tuner = tnr("random_search", batch_size = 10)
tuner$optimize(instance)

Tuning with auto_tuner

Tuning with auto_tuner

Inherits from the Learner class and wraps all tuning components

Runs tune() when $train() is called

Then trains a model on the whole data with the optimal configuration

at = auto_tuner(
  tuner = tnr_grid_search,
  learner = lrn_svm,
  resampling = rsmp_cv3,
  measure = msr_ce
)

Tuning with auto_tuner

at
<AutoTuner:classif.svm.tuned>
* Model: -
* Parameters: list()
* Packages: mlr3, mlr3tuning, mlr3learners, e1071
* Predict Types:  [response], prob
* Feature Types: logical, integer, numeric
* Properties: multiclass, twoclass
* Search Space:
       id    class     lower    upper nlevels
   <char>   <char>     <num>    <num>   <num>
1:   cost ParamDbl -11.51293 11.51293     Inf
2:  gamma ParamDbl -11.51293 11.51293     Inf

Nested Resampling

Nested Resampling AutoTuner

The inner resampling is a 4-fold CV and the outer resampling is a 3-fold CV

Pass an AutoTuner to resample() or benchmark() to start the nested resampling

at = auto_tuner(
  tuner = tnr_grid_search,
  learner = lrn_svm,
  resampling = rsmp("cv", folds = 4),
  measure = msr_ce,
  id = "svm"
)

rr = resample(tsk_sonar, at, rsmp_cv3, store_models = TRUE)

Nested Resampling AutoTuner

rr
<ResampleResult> with 3 resampling iterations
 task_id learner_id resampling_id iteration     prediction_test warnings errors
   sonar        svm            cv         1 <PredictionClassif>        0      0
   sonar        svm            cv         2 <PredictionClassif>        0      0
   sonar        svm            cv         3 <PredictionClassif>        0      0

Unbiased Performance

The estimated performance of a tuned model is reported as the aggregated performance of all outer resampling iterations

rr$aggregate()
classif.ce 
 0.1828847 

Inner Tuning Results

Optimal configurations across all outer folds

extract_inner_tuning_results(rr)[,
  .(iteration, cost, gamma, classif.ce)]
   iteration      cost     gamma classif.ce
       <int>     <num>     <num>      <num>
1:         1 11.512925 -5.756463  0.1955882
2:         2  5.756463 -5.756463  0.1867647
3:         3 11.512925 -5.756463  0.2086134

Inner Tuning Archives

Full tuning archives

extract_inner_tuning_archives(rr)[1:3,
  .(iteration, cost, gamma, classif.ce)]
   iteration       cost    gamma classif.ce
       <int>      <num>    <num>      <num>
1:         1 -11.512925 11.51293  0.4485294
2:         1  -5.756463  0.00000  0.4485294
3:         1   5.756463 11.51293  0.4485294

Defining Search Spaces with ps

Create a search space with ps() to tune cost, kernel and shrinking

search_space = ps(
  cost  = p_dbl(lower = 1e-1, upper = 1e5),
  kernel = p_fct(c("radial", "linear")),
  shrinking = p_lgl()
)

Pass search space to tuning instance

ti(tsk_sonar, lrn("classif.svm", type = "C-classification"), rsmp_cv3,
  msr_ce, trm("none"), search_space = search_space)
<TuningInstanceBatchSingleCrit>
* State:  Not optimized
* Objective: <ObjectiveTuningBatch:classif.svm_on_sonar>
* Search Space:
          id    class lower upper nlevels
      <char>   <char> <num> <num>   <num>
1:      cost ParamDbl   0.1 1e+05     Inf
2:    kernel ParamFct    NA    NA       2
3: shrinking ParamLgl    NA    NA       2
* Terminator: <TerminatorNone>

Complex Transformations

Exponentiate cost and add ‘2’ if "polynomial"

search_space = ps(
  cost = p_dbl(-1, 1, trafo = function(x) exp(x)),
  kernel = p_fct(c("polynomial", "radial")),
  .extra_trafo = function(x, param_set) {
    if (x$kernel == "polynomial") x$cost = x$cost + 2
    x
  }
)
search_space$trafo(list(cost = 1, kernel = "radial"))
$cost
[1] 2.718282

$kernel
[1] "radial"
search_space$trafo(list(cost = 1, kernel = "polynomial"))
$cost
[1] 4.718282

$kernel
[1] "polynomial"

Summary

lrn_rpart = lrn("classif.rpart",
  minsplit  = to_tune(2, 128, logscale = TRUE),
  minbucket = to_tune(1, 64, logscale = TRUE),
  cp        = to_tune(1e-04, 1e-1, logscale = TRUE)
)

at = auto_tuner(
  tuner = tnr("random_search", batch_size = 10),
  learner = lrn_rpart,
  resampling = rsmp("cv", folds = 4),
  measure = msr("classif.ce"),
  term_evals = 50 # random search needs a termination criterion
)

rr = resample(tsk("pima"), at, rsmp("cv", folds = 3), store_models = TRUE)

Sequential Pipelines

Sequential Pipelines

Workflows including data preprocessing, building ensemble-models, or more complicated meta-models

PipeOps are the building blocks

PipeOps are connected to form a Graph or pipeline

PipeOp

Short for Pipeline Operator

Includes a $train() and a $predict() method

Has a $param_set field that defines the hyperparameters

Constructed with the po() function

library(mlr3pipelines)

po_pca = po("pca", center = TRUE)
po_pca
PipeOp: <pca> (not trained)
values: <center=TRUE>
Input channels <name [train type, predict type]>:
  input [Task,Task]
Output channels <name [train type, predict type]>:
  output [Task,Task]

PipeOp Train

PipeOp includes a $train() and a $predict() method

The po("pca") applies a principal component analysis

tsk_small = tsk("penguins_simple")$select(c("bill_depth", "bill_length"))
poin = list(tsk_small$clone()$filter(1:5))
poout = po_pca$train(poin) # poin: Task in a list
poout # list with a single element 'output'
$output
<TaskClassif:penguins> (5 x 3): Simplified Palmer Penguins
* Target: species
* Properties: multiclass
* Features (2):
  - dbl (2): PC1, PC2
poout[[1]]$head(3)
   species       PC1          PC2
    <fctr>     <num>        <num>
1:  Adelie 0.1561004  0.005716376
2:  Adelie 1.2676891  0.789534280
3:  Adelie 1.5336113 -0.174460208

PipeOp State

The training phase typically generates a model of the data, which is saved in the internal $state field

The $state field of po("pca") contains the rotation matrix

po_pca$state
Standard deviations (1, .., p=2):
[1] 1.512660 1.033856

Rotation (n x k) = (2 x 2):
                   PC1        PC2
bill_depth  -0.6116423 -0.7911345
bill_length  0.7911345 -0.6116423

PipeOp Predict

This state is then used during predictions and applied to new data

tsk_onepenguin = tsk_small$clone()$filter(42)
poin = list(tsk_onepenguin)
poout = po_pca$predict(poin)
poout[[1]]$data()
   species      PC1       PC2
    <fctr>    <num>     <num>
1:  Adelie 1.554877 -1.454908

Graph

PipeOps represent individual computational steps in machine learning pipelines

These pipelines themselves are defined by Graph objects

A Graph is a collection of PipeOps with “edges” that guide the flow of data

Graph

The most convenient way of building a Graph is to connect a sequence of PipeOps using the %>>%-operator

po_mutate = po("mutate",
  mutation = list(bill_ratio = ~bill_length / bill_depth)
)
po_scale = po("scale")
graph = po_mutate %>>% po_scale
graph
Graph with 2 PipeOps:
     ID         State sccssors prdcssors
 <char>        <char>   <char>    <char>
 mutate <<UNTRAINED>>    scale          
  scale <<UNTRAINED>>             mutate

Graph

graph$plot(horizontal = TRUE)

Sequential Learner-Pipelines

The most common application of mlr3pipelines is to preprocess data before feeding it into a Learner

Learners as PipeOps

Learner objects can be converted to PipeOps

lrn_logreg = lrn("classif.log_reg")
graph = po("imputesample") %>>% lrn_logreg
graph$plot(horizontal = TRUE)

Graphs as Learners

To use a Graph as a Learner with an identical interface, it can be wrapped in a GraphLearner object with as_learner()

glrn_sample = as_learner(graph)
glrn_mode = as_learner(po("imputemode") %>>% lrn_logreg)

design = benchmark_grid(tsk("pima"), list(glrn_sample, glrn_mode),
  rsmp("cv", folds = 3))
bmr = benchmark(design)
aggr = bmr$aggregate()[, .(learner_id, classif.ce)]
aggr
                     learner_id classif.ce
                         <char>      <num>
1: imputesample.classif.log_reg  0.2330729
2:   imputemode.classif.log_reg  0.2330729

Configuring Pipeline Hyperparameters

PipeOp hyperparameters are collected together in the $param_set of a graph and prefixed with the ID of the PipeOp

graph = po("scale", center = FALSE, scale = TRUE, id = "scale") %>>%
  po("scale", center = TRUE, scale = FALSE, id = "center") %>>%
  lrn("classif.rpart", cp = 1)
unlist(graph$param_set$values)
      scale.center        scale.scale       scale.robust      center.center 
                 0                  1                  0                  1 
      center.scale      center.robust   classif.rpart.cp classif.rpart.xval 
                 0                  0                  1                  0 
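Prefixed hyperparameters can also be set after the graph is built, through the combined parameter set (a minimal sketch, assuming mlr3verse is installed):

```r
library(mlr3verse)

graph = po("scale", id = "center") %>>% lrn("classif.rpart")

# hyperparameters are addressed as "<pipeop id>.<param id>"
graph$param_set$set_values(center.scale = FALSE, classif.rpart.cp = 0.01)
graph$param_set$values$classif.rpart.cp
# [1] 0.01
```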

Non-Sequential Pipelines

Non-Sequential Pipelines

Non-sequential pipelines can perform more complex operations

Using the gunion() function, we can instead combine multiple PipeOps, Graphs, or a mixture of both, into a parallel Graph

graph = po("scale", center = TRUE, scale = FALSE) %>>%
  gunion(list(
    po("missind"),
    po("imputemedian")
  )) %>>%
  po("featureunion")

graph$plot(horizontal = TRUE)

Common Patterns and ppl()

Many common ML problems can be solved with the same pipeline patterns

ppl("bagging", graph) creates a bagging ensemble

ppl("branch", graphs) creates a branch

ppl("robustify") applies common preprocessing steps

ppl("stacking", base_learners, super_learner) creates a stacking ensemble
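For example, ppl("robustify") wraps standard preprocessing around a learner in a single call (a sketch; passing the learner lets the pipeline adapt to its properties):

```r
library(mlr3verse)

lrn_rpart = lrn("classif.rpart")
# robustify handles missing values, factor encoding, constant features, etc.
graph = ppl("robustify", learner = lrn_rpart) %>>% lrn_rpart
glrn = as_learner(graph)
glrn$train(tsk("penguins")) # penguins contains missing values
```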

Branching

po("branch") creates multiple paths; data flows through only one of them, as determined by the selection hyperparameter

Use po("unbranch") (with the same arguments as po("branch")) to merge the outputs back into a single result object

Branching

To demonstrate alternative paths, we use the MNIST (LeCun et al. 1998) data, which lends itself well to preprocessing

library(mlr3oml)
otsk_mnist = otsk(id = 3573)
tsk_mnist = as_task(otsk_mnist)$
  filter(sample(70000, 1000))$
  select(otsk_mnist$feature_names[sample(700, 100)])

Branching

Do nothing: po("nop")

Apply PCA: po("pca")

Remove constant features with po("removeconstants"), then apply the Yeo-Johnson transform with po("yeojohnson")

paths = c("nop", "pca", "yeojohnson")

graph = po("branch", paths, id = "brnchPO") %>>%
  gunion(list(
    po("nop"),
    po("pca"),
    po("removeconstants", id = "rm_const") %>>%
      po("yeojohnson", id = "YJ")
  )) %>>% po("unbranch", paths, id = "unbrnchPO")

Branching

graph$plot(horizontal = TRUE)

Branching

The output of this Graph depends on the setting of the branch.selection hyperparameter

# use the "PCA" path
graph$param_set$values$brnchPO.selection = "pca"
# new PCA columns
head(graph$train(tsk_mnist)[[1]]$feature_names)
[1] "PC1" "PC2" "PC3" "PC4" "PC5" "PC6"
# use the "No-Op" path
graph$param_set$values$brnchPO.selection = "nop"
# same features
head(graph$train(tsk_mnist)[[1]]$feature_names)
[1] "pixel1"  "pixel3"  "pixel22" "pixel32" "pixel34" "pixel38"

Tune Branch Pipeline

Branching can even be used to tune which of several learners is most appropriate for a given dataset

graph_learner = graph %>>%
  ppl("branch", lrns(c("classif.rpart", "classif.kknn")))
graph_learner$plot(horizontal = TRUE)

Tune Branch Pipeline

Tuning the selection hyperparameters can help determine which of the possible options work best in combination

graph_learner = as_learner(graph_learner)

graph_learner$param_set$set_values(
  brnchPO.selection = to_tune(paths),
  branch.selection = to_tune(c("classif.rpart", "classif.kknn")),
  classif.kknn.k = to_tune(p_int(1, 32,
    depends = branch.selection == "classif.kknn"))
)

instance = tune(tnr("grid_search"), tsk_mnist, graph_learner,
  rsmp("repeated_cv", folds = 3, repeats = 3), msr("classif.ce"))

instance$archive$data[order(classif.ce)[1:5],
  .(brnchPO.selection, classif.kknn.k, branch.selection, classif.ce)]

autoplot(instance)

Additional Features

  • Error handling with encapsulation and fallbacks
  • Parallelization with future, mlr3batchmark and rush
  • Advanced logging with the lgr package
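For example, encapsulation with a fallback learner keeps a resampling running even when a learner errors (a sketch; the $encapsulate() method shown here is the interface in recent mlr3 versions):

```r
library(mlr3)

lrn_rpart = lrn("classif.rpart")
# run train/predict in a sandbox; on error, fall back to a featureless learner
lrn_rpart$encapsulate("evaluate", fallback = lrn("classif.featureless"))
rr = resample(tsk("penguins"), lrn_rpart, rsmp("cv", folds = 3))
rr$errors # table of caught errors, if any
```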

References

Binder, Martin, Florian Pfisterer, and Bernd Bischl. 2020. “Collecting Empirical Data about Hyperparameters for Data Driven AutoML.” In Proceedings of the 7th ICML Workshop on Automated Machine Learning (AutoML 2020). https://www.automl.org/wp-content/uploads/2020/07/AutoML_2020_paper_63.pdf.
Bischl, Bernd, Martin Binder, Michel Lang, Tobias Pielok, Jakob Richter, Stefan Coors, Janek Thomas, et al. 2023. “Hyperparameter Optimization: Foundations, Algorithms, Best Practices, and Open Challenges.” Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, e1484. https://doi.org/10.1002/widm.1484.
Kuehn, Daniel, Philipp Probst, Janek Thomas, and Bernd Bischl. 2018. “Automatic Exploration of Machine Learning Experiments on OpenML.” https://arxiv.org/abs/1806.10961.
LeCun, Yann, Léon Bottou, Yoshua Bengio, and Patrick Haffner. 1998. “Gradient-Based Learning Applied to Document Recognition.” Proceedings of the IEEE 86 (11): 2278–2324. https://doi.org/10.1109/5.726791.